Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #1810   +/-   ##
=======================================
  Coverage   99.81%   99.81%
=======================================
  Files         142      142
  Lines        4873     4873
  Branches      472      472
=======================================
  Hits         4864     4864
  Misses          6        6
  Partials        3        3
```

☔ View full report in Codecov by Sentry.
tagging you @nspope -- if you could help to review this that would be amazing

@petrelharp -- heads up, it looks like the pyslim docs site is returning a 404
```shell
# --- Validate against stdpopsim catalog ---
echo "Validating against stdpopsim catalog..."
VALIDATION=$(python3 << PYEOF
```
I'm probably being dense here, but why do we need to recreate this validation here if the same code is being run as an action? Shouldn't it be one or the other?
No, this is a fair point. My thinking was that the local validation gives fast feedback before creating a draft release and triggering the workflow, while the workflow validation is the security gate (since someone could bypass the script). But it is duplicated code -- this is the "belt and suspenders" defensive approach I was talking to you about.
We could remove the local validation and let the workflow be the single source of truth, but that would make for a slower feedback loop.
I vote for checking this twice!
For instance, it's relatively easy for the workflow to be accidentally turned off; doing it locally is, I think, great.
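If both checks stay, one way to avoid maintaining the logic twice is to put the actual check in a single helper that both the local script's embedded Python and the workflow step call. A sketch with hypothetical names -- nothing in the PR is structured this way yet:

```python
import hashlib

# Hypothetical shared validator: if this lived in one module, both the local
# script and the workflow could import it, so the check would be duplicated
# in invocation only, not in logic.
def validate_sha256(path, expected_sha256):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large tarballs don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```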
```shell
# ./maintenance/upload_to_s3.sh --dry-run data/test.tar.gz HomSap genetic_map HapMapII_GRCh38
#
# The script looks up the resource in the stdpopsim catalog to find the
# expected S3 URL and SHA256, verifies the local file matches, then uploads
```
"verifies the local file matches" makes it seem like the file has already been uploaded; but that can't be the case, right?
I think it'd help to clarify if you spelled out the workflow a bit more, something like:
- Create a catalogue entry for the new resource, in a feature branch
- Use this script and point it at the local path to the resource; it will validate it against the feature branch and upload it to AWS
- Merge the feature branch

Maybe this should go in the developer docs as well?
yeah, this comment is ambiguous -- agreed this should be changed and added to the dev docs
```shell
echo "Or watch the latest run:"
echo "  gh run watch --repo $REPO \$(gh run list --repo $REPO --workflow $WORKFLOW --limit 1 --json databaseId --jq '.[0].databaseId')"
echo ""
echo "Final S3 URL: $S3_URL"
```
Do you think it's worth uploading some metadata or otherwise trying to version what's on AWS? So that, if someone uses this script, we'll have a record of who/when/what? Or, I guess this would be tracked by the feature branch that adds the new resource?
good idea! I looked into this quickly and see that we can include metadata with a call like:

```shell
aws s3 cp file.tar.gz s3://bucket/path.tar.gz \
    --metadata '{"uploaded-by":"andrewkern","pr":"1810","timestamp":"2026-03-05"}'
```
Where does that metadata go? I was imagining @nspope was suggesting also uploading a JSON file.
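For what it's worth, metadata set with `--metadata` is stored on the S3 object itself and comes back on `head-object`/`get-object` requests; it doesn't show up in bucket listings. If the goal is a record that's easy to see and version, a sidecar JSON file uploaded next to the tarball may be the simpler option. A sketch of building such a record -- the field names are illustrative, not anything the script currently writes:

```python
import datetime
import json

# Illustrative: a provenance record to upload as e.g. path.tar.gz.json
# alongside the artifact. Field names are hypothetical.
def provenance_record(uploader, pr, sha256):
    return {
        "uploaded-by": uploader,
        "pr": pr,
        "sha256": sha256,
        "timestamp": datetime.date.today().isoformat(),
    }

print(json.dumps(provenance_record("andrewkern", "1810", "ab" * 32), indent=2))
```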
```python
expected_sha256 = resource.intervals_sha256
```

```python
# Extract S3 path from URL (handles both s3-us-west-2 and s3.us-west-2 styles)
m = re.match(r"https://stdpopsim\.s3[.-]us-west-2\.amazonaws\.com/(.*)", s3_url)
```
I'm unclear why this is necessary?
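If I'm reading the diff right, the `[.-]` character class is there because the bucket's URLs can appear in both the legacy dashed region style (`s3-us-west-2`) and the current dotted style (`s3.us-west-2`), and one pattern accepts both. A quick check:

```python
import re

# The pattern from the diff: [.-] matches either the legacy dashed style
# (s3-us-west-2) or the current dotted style (s3.us-west-2) of S3 URL.
PATTERN = r"https://stdpopsim\.s3[.-]us-west-2\.amazonaws\.com/(.*)"

def s3_key(s3_url):
    m = re.match(PATTERN, s3_url)
    return m.group(1) if m else None

# Both spellings yield the same key:
print(s3_key("https://stdpopsim.s3-us-west-2.amazonaws.com/genetic_maps/HomSap/x.tar.gz"))
print(s3_key("https://stdpopsim.s3.us-west-2.amazonaws.com/genetic_maps/HomSap/x.tar.gz"))
```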
```shell
echo ""
if [ "$DRY_RUN" = true ]; then
    echo "Dry-run workflow triggered. It will validate everything but skip the S3 upload."
```
should the dry-run also skip the release?
```shell
    --repo "$REPO" \
    --draft \
    --title "S3 upload: $SPECIES_ID/$RESOURCE_TYPE/$RESOURCE_ID" \
    --notes "Temporary draft release for S3 upload. Will be cleaned up automatically." \
```
Where does the advertised cleaning up happen?
ah I see, here. But it won't get cleaned up if someone doesn't run the aws script. I'm a little unclear about what the "release" actually is, though.
```shell
    exit 1
fi
```

```shell
# S3 destination must end with .tar.gz or .tgz
```
How about we also validate it's of the form _vN.tar.gz (as required)?
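A minimal form of that check, assuming the convention is literally a `_v<N>` suffix before the extension (the helper below is hypothetical, not in the PR):

```python
import re

# Hypothetical: require the S3 destination to end in _v<N>.tar.gz or _v<N>.tgz.
VERSIONED = re.compile(r"_v\d+\.(?:tar\.gz|tgz)$")

def is_versioned(dest):
    return VERSIONED.search(dest) is not None
```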
```yaml
run: |
  DEST="${{ inputs.s3_destination }}"
  if aws s3api head-object --bucket stdpopsim --key "$DEST" 2>/dev/null; then
    echo "ERROR: s3://stdpopsim/$DEST already exists. Refusing to overwrite."
```
Suggested change:
```suggestion
echo "ERROR: s3://stdpopsim/$DEST already exists. Refusing to overwrite."
echo "Do not replace or remove previously-released files!"
```
```yaml
  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
  TAG="${{ inputs.release_tag }}"
  echo "Cleaning up draft release: $TAG"
```
we should at least validate the form of $TAG here to make sure this isn't deleting something it shouldn't?
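One way to do that: only delete tags that match whatever naming scheme upload_to_s3.sh uses for these temporary draft releases. A sketch -- the `s3-upload-` prefix here is illustrative; substitute the actual scheme the script generates:

```python
import re

# Hypothetical guard before `gh release delete`: refuse any tag that doesn't
# match the naming scheme of this workflow's temporary draft releases.
SAFE_TAG = re.compile(r"s3-upload-[A-Za-z0-9._-]+")

def safe_to_delete(tag):
    return SAFE_TAG.fullmatch(tag) is not None
```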
```yaml
  echo "SHA256: ${{ inputs.expected_sha256 }}"
```

```yaml
- name: Dry run summary
  if: ${{ inputs.dry_run == true }}
```
If someone runs --dry-run, then won't the draft release get deleted, so they won't be able to continue on to do the wet run?
Rebasing should make the pyslim error go away.
This is a workflow that aims to securely deal with uploading new artifacts (genetic maps / annotations) to the AWS site. The main pieces:
- A local script (maintenance/upload_to_s3.sh) that validates uploads against the catalog before triggering the workflow
- An environment requiring admin approval

A maintainer runs the local script, pointing it at a tarball and the catalog entry it corresponds to:

```shell
./maintenance/upload_to_s3.sh data/HapMapII_GRCh38.tar.gz HomSap genetic_map HapMapII_GRCh38
```

The script:
The workflow then:
This is in a draft state currently and will need some review.